Multifunctional Computational Lexicon of Contemporary Portuguese: An Available Resource for Multitype Applications
نویسندگان
چکیده
This paper presents some aspects of the first Portuguese frequency lexicon extracted from a corpus of large dimensions. The Multifunctional Computational Lexicon of Contemporary Portuguese (henceforth MCL) rised from the necessity of filling a gap existent in the studies of the contemporary Portuguese. Until recently, the frequency lexicons of Portuguese were of very small dimensions, such as Português Fundamental, which is constituted by 2.217 words extracted from a 700.000 word corpus and the Frequency Dictionary of Portuguese Words based on a literary corpus of 500.000 words. We describe here the main steps taken for collecting the lexical and frequency data and some of the major problems that arouse in the process. The resulting lexicon is a freely available reliable resource for several types of applications.
منابع مشابه
Some Language Resources and Tools for Computational Processing of Portuguese at INESC
In the last few years automatic processing tools and studies based on corpora have became of a great importance for the community. The possibility of evaluating and developing such tools and studies depends on the availability of language resources. For the Portuguese language in its several national varieties these resources are not enough to meet the community needs. In this paper some valuab...
متن کاملExtending NomLex-PT using AnCora-Nom
This work describes how we used AnCora-Nom, a Spanish nominalization lexicon, to extend NomLex-PT, a lexical resource for Portuguese, originally based on the English NomLex lexicon and fully integrated to OpenWordNet-PT, our freely available Portuguese WordNet. The complete Spanish lexicon, which contains 1,655 entries, was translated to Portuguese and then compared to our previous data. Furthe...
متن کاملNomLex-PT: A Lexicon of Portuguese Nominalizations
This paper presents NomLex-PT, a lexical resource describing Portuguese nominalizations. NomLex-PT connects verbs to their nominalizations, thereby enabling NLP systems to observe the potential semantic relationships between the two words when analysing a text. NomLex-PT is freely available and encoded in RDF for easy integration with other resources. Most notably, we have integrated NomLex-PT ...
متن کاملConstructing an Ontology of Japanese Lexical Properties: Specifying its Property Structures and Lexical Entries
Regarding the construction of an ontology of Japanese lexical properties (JLP-O) as fundamental in terms of establishing a conceptual framework to guide and facilitate the construction of a large-scale lexical resource (LR) database of the Japanese lexicon, this paper primarily focuses on two major concerns for the construction of the JLP-O. The first is to map out and appropriately structure t...
متن کاملThe Brazilian Portuguese Lexicon: An Instrument for Psycholinguistic Research
In this article, we present the Brazilian Portuguese Lexicon, a new word-based corpus for psycholinguistic and computational linguistic research in Brazilian Portuguese. We describe the corpus development, the specific characteristics on the internet site and database for user access. We also perform distributional analyses of the corpus and comparisons to other current databases. Our main obje...
متن کامل